Zhining Sun
UEP239 Geospatial Programing Final Project
STEM workers usually refer to workers in the field of natural science, math, engineering and technology. Currently, in the United States, STEM workers takes account of 6.2% of total US employment. This number is expected to increase due to the important role played by STEM workers in growing the economy and staying globally competitive. Among STEM workers, proportion of foreign-born workers keeps rising too. Comparing with native-born workers, foreign-borns workers are likely to have relatively less information but higher demand on renting. Their needs should be paid attention. This project is designed to answer the research question: what is suitable areas for foreign-born STEM workers? The tool used by this project is suitability analysis.
#To start with the analysis, we first have to import libraries.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
%matplotlib inline
import folium
import seaborn as sns
import rasterio
from rasterio.plot import show
from rasterio import features
import geopandas as gpd
import richdem as rd
from scipy import ndimage
from rasterstats import zonal_stats
import osmnx as ox
import networkx as nx
from geopy.geocoders import Nominatim
from pyproj import CRS
from shapely.geometry import LineString, Point
#Before we start the analysis, we need to check the work dictionary
import os
os.getcwd()
'/Users/zhiningsun/Documents/GitHub/UEP239-FinalProject'
To start with the analyis, we will extract all ZCTAs that are within the study area. We will extract boundaries, clip outline with ZCTA and then do the extraction as suggested in the Pizza.
#Note that the area for this project is Boston Metropolitan Area.
#We want to read in boundaries
Boundaries = gpd.read_file('Data/MPO/MPO_Boundaries.shp')
Boundaries.head(2)
| OBJECTID | MPO | created_us | created_da | last_edite | last_edi_1 | GlobalID | ShapeSTAre | ShapeSTLen | geometry | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | Berkshire | None | 1970-01-01 | None | 1970-01-01 | {08FDA544-18B0-412A-B442-287E53E987F7} | 2.451015e+09 | 2.471530e+05 | POLYGON ((-8128884.676 5272654.345, -8128962.2... |
| 1 | 3 | Cape Cod | None | 1970-01-01 | None | 1970-01-01 | {B6CD90CF-2F7D-43F2-B251-FA7F8E00EF01} | 1.067067e+09 | 1.288227e+06 | MULTIPOLYGON (((-7813968.781 5173329.197, -781... |
Boundaries.plot()
plt.show()
#Extract all ZCTAs that are within the study area
#Read in MA outline
Outline = gpd.read_file('Data/MA_Outlines/OUTLINE25K_POLY.shp')
Outline
| AREA_ACRES | OBJECTID | SHAPE_AREA | SHAPE_LEN | geometry | |
|---|---|---|---|---|---|
| 0 | 4.794042e+06 | 1 | 1.940080e+10 | 2.457234e+06 | POLYGON ((262720.970 931706.880, 262726.090 93... |
| 1 | 6.400000e-02 | 2 | 2.589837e+02 | 6.441277e+01 | POLYGON ((247826.580 955016.130, 247812.530 95... |
| 2 | 1.756000e-01 | 3 | 7.104337e+02 | 1.160627e+02 | POLYGON ((249219.860 954550.130, 249209.020 95... |
| 3 | 4.402800e+00 | 4 | 1.781757e+04 | 6.075335e+02 | POLYGON ((248952.170 954025.880, 248958.480 95... |
| 4 | 6.554800e+00 | 5 | 2.652649e+04 | 7.402520e+02 | POLYGON ((248624.580 953915.130, 248580.140 95... |
| ... | ... | ... | ... | ... | ... |
| 913 | 1.546410e+01 | 909 | 6.258092e+04 | 2.040354e+03 | POLYGON ((300297.910 787312.190, 300309.660 78... |
| 914 | 9.980571e+02 | 910 | 4.038993e+06 | 1.612254e+04 | POLYGON ((304728.970 784910.810, 304737.380 78... |
| 915 | 1.028706e+02 | 912 | 4.163025e+05 | 3.924734e+03 | POLYGON ((307457.000 781867.940, 307466.560 78... |
| 916 | 6.175377e+02 | 914 | 2.499086e+06 | 7.740494e+03 | POLYGON ((257218.250 779297.880, 257223.020 77... |
| 917 | 3.035000e-01 | 918 | 1.228235e+03 | 1.828018e+02 | POLYGON ((256614.770 778306.630, 256624.940 77... |
918 rows × 5 columns
#Read in ZCTAs
ZCTA = gpd.read_file('Data/ZCTA/tl_2010_25_zcta510.shp')
ZCTA.head(2)
| STATEFP10 | ZCTA5CE10 | GEOID10 | CLASSFP10 | MTFCC10 | FUNCSTAT10 | ALAND10 | AWATER10 | INTPTLAT10 | INTPTLON10 | PARTFLG10 | geometry | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 25 | 02536 | 2502536 | B5 | G6350 | S | 71696166 | 9066635 | +41.5969756 | -070.5678768 | N | POLYGON ((-70.59239 41.56006, -70.59268 41.559... |
| 1 | 25 | 02556 | 2502556 | B5 | G6350 | S | 10034104 | 1164445 | +41.6394454 | -070.6245149 | N | POLYGON ((-70.62389 41.61673, -70.62633 41.617... |
#We will clip Outline and ZCTA
#Check for projection
Outline.crs == ZCTA.crs
False
Outline.crs
<Projected CRS: EPSG:26986> Name: NAD83 / Massachusetts Mainland Axis Info [cartesian]: - X[east]: Easting (metre) - Y[north]: Northing (metre) Area of Use: - name: United States (USA) - Massachusetts onshore - counties of Barnstable; Berkshire; Bristol; Essex; Franklin; Hampden; Hampshire; Middlesex; Norfolk; Plymouth; Suffolk; Worcester. - bounds: (-73.5, 41.46, -69.86, 42.89) Coordinate Operation: - name: SPCS83 Massachusetts Mainland zone (meters) - method: Lambert Conic Conformal (2SP) Datum: North American Datum 1983 - Ellipsoid: GRS 1980 - Prime Meridian: Greenwich
ZCTA.crs
<Geographic 2D CRS: EPSG:4269> Name: NAD83 Axis Info [ellipsoidal]: - Lat[north]: Geodetic latitude (degree) - Lon[east]: Geodetic longitude (degree) Area of Use: - name: North America - onshore and offshore: Canada - Alberta; British Columbia; Manitoba; New Brunswick; Newfoundland and Labrador; Northwest Territories; Nova Scotia; Nunavut; Ontario; Prince Edward Island; Quebec; Saskatchewan; Yukon. Puerto Rico. United States (USA) - Alabama; Alaska; Arizona; Arkansas; California; Colorado; Connecticut; Delaware; Florida; Georgia; Hawaii; Idaho; Illinois; Indiana; Iowa; Kansas; Kentucky; Louisiana; Maine; Maryland; Massachusetts; Michigan; Minnesota; Mississippi; Missouri; Montana; Nebraska; Nevada; New Hampshire; New Jersey; New Mexico; New York; North Carolina; North Dakota; Ohio; Oklahoma; Oregon; Pennsylvania; Rhode Island; South Carolina; South Dakota; Tennessee; Texas; Utah; Vermont; Virginia; Washington; West Virginia; Wisconsin; Wyoming. US Virgin Islands. British Virgin Islands. - bounds: (167.65071105957, 14.928194130078, -47.743430543984, 86.453745196084) Datum: North American Datum 1983 - Ellipsoid: GRS 1980 - Prime Meridian: Greenwich
#Change projection
ZCTA = ZCTA.to_crs("EPSG:26986")
Outline.crs == ZCTA.crs
True
#Clip
ZCTA_Outline = gpd.clip(ZCTA,Outline)
# Extract ZCTA whose centroids are within study area
#Check for projection
ZCTA_Outline.crs == Boundaries.crs
False
Boundaries.crs
<Projected CRS: EPSG:3857> Name: WGS 84 / Pseudo-Mercator Axis Info [cartesian]: - X[east]: Easting (metre) - Y[north]: Northing (metre) Area of Use: - name: World between 85.06°S and 85.06°N. - bounds: (-180.0, -85.06, 180.0, 85.06) Coordinate Operation: - name: Popular Visualisation Pseudo-Mercator - method: Popular Visualisation Pseudo Mercator Datum: World Geodetic System 1984 ensemble - Ellipsoid: WGS 84 - Prime Meridian: Greenwich
#Change projection
ZCTA_U=ZCTA_Outline.to_crs("EPSG:3857")
ZCTA_U.crs == Boundaries.crs
True
# Extract all the ZCTAs whose centroid is within the Boston Region MPO boundary.
zcta_boston = ZCTA_U[ZCTA_U.centroid.within(Boundaries['geometry'][10])]
zcta_boston.head(2)
| STATEFP10 | ZCTA5CE10 | GEOID10 | CLASSFP10 | MTFCC10 | FUNCSTAT10 | ALAND10 | AWATER10 | INTPTLAT10 | INTPTLON10 | PARTFLG10 | geometry | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 17 | 25 | 01905 | 2501905 | B5 | G6350 | S | 9219345 | 1195154 | +42.4659985 | -070.9757922 | N | MULTIPOLYGON (((-7899944.411 5232206.578, -789... |
| 18 | 25 | 01904 | 2501904 | B5 | G6350 | S | 11708211 | 1303900 | +42.4924563 | -070.9739297 | N | POLYGON ((-7897468.553 5233487.096, -7897513.7... |
#boston region
zcta_boston.plot()
plt.show()
After we have extracted ZCTAs in the study area, next step is to identify indicators. As far as I am conerned, for foreign-born STEM workers, they tend to have relatively higher salary. Comparing with other non-STEM immigrants, they have relatively higher salary and tend to be more care about the infrastructures/accessibility and environment within their neighborhoods. Hence, I pick landcover, hospitals, tree canopy, schools, MBTA_nodes and libraries as initial four indicators.
For each indicator, I will do the summary, visualization and reclassification such as calculating the value by zipcode, mapping the location, creating a choropleth map, ranking the area based on values.
I will start my analysis by landcover data to identify potential livable areas. Landcover is raster data.
# I will first read in the landcover data
with rasterio.open('Data/nlcd/NLCD_2016_Land_Cover_Boston.tif') as landcover:
landcover_shape = landcover.shape
landcover_transform = landcover.transform
landcover_res = landcover.res
landcover_crs = landcover.crs
landcover_nodata = landcover.nodata
landcover_bounds = landcover.bounds
landcover_1 = landcover.read(1)
#Visualization
show(landcover_1, transform = landcover_transform,cmap='Pastel1' )
plt.show()
#Create reclassification
landcover_reclass = np.full(landcover_shape, np.NaN)
#Reclassify
landcover_reclass[(landcover_1 == 23)] =5
landcover_reclass[(landcover_1 == 24)] = 4
landcover_reclass[(landcover_1 == 22)] = 3
landcover_reclass[(landcover_1 ==21)] = 2
landcover_reclass[(landcover_1 !=23) & (landcover_1 != 24)& (landcover_1 != 22)& (landcover_1 != 21)] = 1
landcover_reclass
array([[1., 1., 1., ..., 1., 1., 1.],
[1., 1., 1., ..., 1., 1., 1.],
[1., 1., 1., ..., 1., 1., 1.],
...,
[1., 1., 1., ..., 1., 1., 1.],
[1., 1., 1., ..., 1., 1., 1.],
[1., 1., 1., ..., 1., 1., 1.]])
#Visualization after Reclassification
show(landcover_reclass,cmap='Blues',transform=landcover_transform)
plt.show()
I will start working on tree canopy as my second indicator which is an important component for the environment. Tree canopy is also raster data.
with rasterio.open('Data/nlcd/NLCD_2016_Tree_Canopy_Boston.tif') as Trees:
Trees_shape = Trees.shape
Trees_transform = Trees.transform
Trees_res = Trees.res
Trees_crs = Trees.crs
Trees_nodata = Trees.nodata
Trees_bounds = Trees.bounds
Trees_1 = Trees.read(1)
#Check if tree canopy has the same medata as landcover
with rasterio.open('Data/nlcd/NLCD_2016_Land_Cover_Boston.tif') as landcover:
landcover_meta = landcover.meta
with rasterio.open('Data/nlcd/NLCD_2016_Tree_Canopy_Boston.tif') as Trees:
Trees_meta = Trees.meta
Trees_meta ==landcover_meta
True
#Visualization
show(Trees_1, transform = Trees_transform,cmap='Greens')
plt.show()
Trees_1
array([[ 86, 88, 84, ..., 255, 255, 255],
[ 88, 89, 89, ..., 255, 255, 255],
[ 85, 89, 88, ..., 255, 255, 255],
...,
[ 85, 84, 89, ..., 0, 0, 0],
[ 89, 87, 87, ..., 0, 0, 0],
[ 89, 87, 86, ..., 0, 0, 0]], dtype=uint8)
Trees_reclass = np.full(Trees_shape, np.NaN)
#Reclassify
Trees_reclass[(Trees_1 >= 60)& (Trees_1 < 70)] =5
Trees_reclass[((Trees_1 >= 50)& (Trees_1 < 60))|((Trees_1 >= 70)& (Trees_1 < 80))] =4
Trees_reclass[((Trees_1 >= 40)& (Trees_1 < 50))|((Trees_1 >= 80)& (Trees_1 < 90))] =3
Trees_reclass[((Trees_1 >= 15)& (Trees_1 < 40))|(Trees_1 >= 90)] =2
Trees_reclass[Trees_1 < 15] =1
#Visualization after Reclassification
show(Trees_reclass,cmap='Greens',transform=landcover_transform)
plt.show()
After looking at two environment indicators, I will start looking at indicators that are related to infrastructures.
To start with, I will look at hospitals.
#I will start with hospitals
#Hospitals
Hospitals = gpd.read_file('Data/hospitals/HOSPITALS_PT.shp')
Hospitals.head(2)
| IDNUMBER | DPHID | NAME | SHORTNAME | ADDRESS | TOWN | GEOG_TOWN | ZIPCODE | CHIAREGION | TELEPHONE | ... | TAXSTATUS | BEDCOUNT | ER_STATUS | TRAUMA_ADU | TRAUMA_PED | SPEPUBFUND | FYE | MADID | EMSREGION | geometry | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2069 | 2069 | Beth Israel Deaconess Medical Center East | BIDMC East | 330 Brookline Avenue | Boston | BOSTON | 02215 | Metro Boston | (617) 667-7000 | ... | Non-profit | 248 | N | None | None | Not Applicable | 2017-09-30 | 35022096 | 4 | POINT (232494.091 898890.122) |
| 1 | 2085 | 2085 | Steward St. Elizabeth's Medical Center | St. Elizabeth's Medical Center | 736 Cambridge Street | Brighton | BOSTON | 02135 | Metro Boston | (617) 789-3000 | ... | For profit | 252 | Y | None | None | Not Applicable | 2017-12-31 | 35156980 | 4 | POINT (229003.958 899958.291) |
2 rows × 22 columns
type(Hospitals)
geopandas.geodataframe.GeoDataFrame
#Check for projection before join
zcta_boston.crs==Hospitals.crs
False
hospital=Hospitals.to_crs(zcta_boston.crs)
hospital.crs == zcta_boston.crs
True
#Join the two datasets
hospital_m =hospital.merge(zcta_boston,left_on='ZIPCODE', right_on='ZCTA5CE10')
hospital_m.head(2)
| IDNUMBER | DPHID | NAME | SHORTNAME | ADDRESS | TOWN | GEOG_TOWN | ZIPCODE | CHIAREGION | TELEPHONE | ... | GEOID10 | CLASSFP10 | MTFCC10 | FUNCSTAT10 | ALAND10 | AWATER10 | INTPTLAT10 | INTPTLON10 | PARTFLG10 | geometry_y | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2069 | 2069 | Beth Israel Deaconess Medical Center East | BIDMC East | 330 Brookline Avenue | Boston | BOSTON | 02215 | Metro Boston | (617) 667-7000 | ... | 2502215 | B5 | G6350 | S | 1979040 | 280050 | +42.3475927 | -071.1029340 | N | POLYGON ((-7917617.107 5213795.544, -7917579.0... |
| 1 | 2335 | 2335 | Dana-Farber Cancer Institute | Dana-Farber Cancer Institute | 450 Brookline Avenue | Boston | BOSTON | 02215 | Metro Boston | (617) 632-3000 | ... | 2502215 | B5 | G6350 | S | 1979040 | 280050 | +42.3475927 | -071.1029340 | N | POLYGON ((-7917617.107 5213795.544, -7917579.0... |
2 rows × 34 columns
#Summary for number of hospitals per zipcode area
count_hospital = hospital_m['IDNUMBER'].groupby(hospital_m['ZIPCODE']).agg(['count','nunique'])
count_hospital
| count | nunique | |
|---|---|---|
| ZIPCODE | ||
| 01702 | 1 | 1 |
| 01742 | 1 | 1 |
| 01752 | 1 | 1 |
| 01757 | 1 | 1 |
| 01760 | 1 | 1 |
| 01890 | 1 | 1 |
| 01904 | 1 | 1 |
| 01915 | 1 | 1 |
| 01960 | 1 | 1 |
| 01970 | 1 | 1 |
| 02062 | 1 | 1 |
| 02111 | 1 | 1 |
| 02114 | 3 | 3 |
| 02115 | 2 | 2 |
| 02118 | 2 | 2 |
| 02120 | 1 | 1 |
| 02124 | 1 | 1 |
| 02130 | 1 | 1 |
| 02135 | 1 | 1 |
| 02138 | 1 | 1 |
| 02139 | 1 | 1 |
| 02149 | 1 | 1 |
| 02155 | 1 | 1 |
| 02176 | 1 | 1 |
| 02186 | 1 | 1 |
| 02190 | 1 | 1 |
| 02215 | 3 | 3 |
| 02462 | 1 | 1 |
#Map the location of each hospital
type(hospital_m)
pandas.core.frame.DataFrame
#Change to geopanda dataframe
from shapely.geometry import Point
from geopandas import GeoDataFrame
hospital_g = GeoDataFrame(hospital_m, geometry=hospital_m['geometry_x'])
hospital_g.plot(column = "IDNUMBER",ax=zcta_boston.plot())
<AxesSubplot:>
#make choropleth map
spatial_hos = zcta_boston.merge(count_hospital,left_on='ZCTA5CE10', right_on='ZIPCODE')
spatial_hos.head(2)
| STATEFP10 | ZCTA5CE10 | GEOID10 | CLASSFP10 | MTFCC10 | FUNCSTAT10 | ALAND10 | AWATER10 | INTPTLAT10 | INTPTLON10 | PARTFLG10 | geometry | count | nunique | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 25 | 01904 | 2501904 | B5 | G6350 | S | 11708211 | 1303900 | +42.4924563 | -070.9739297 | N | POLYGON ((-7897468.553 5233487.096, -7897513.7... | 1 | 1 |
| 1 | 25 | 01915 | 2501915 | B5 | G6350 | S | 39091336 | 3958118 | +42.5702688 | -070.8669962 | N | MULTIPOLYGON (((-7890462.043 5249910.060, -788... | 1 | 1 |
zcta_boston.crs == spatial_hos.crs
True
Since count only shows results for zipcodes who have at least one hispotal, the joined dataset above does not include zipcodes areas that don't have hospitals. Hence I will do a spatial join to include all ZCTAs within the Boston region. For those zipcodes without hospital, the count will be shown as NaN.
#Spatial Join
hospital_join = gpd.sjoin(zcta_boston,spatial_hos, how='left', op='contains', lsuffix='left', rsuffix='right')
hospital_join.head()
| STATEFP10_left | ZCTA5CE10_left | GEOID10_left | CLASSFP10_left | MTFCC10_left | FUNCSTAT10_left | ALAND10_left | AWATER10_left | INTPTLAT10_left | INTPTLON10_left | ... | CLASSFP10_right | MTFCC10_right | FUNCSTAT10_right | ALAND10_right | AWATER10_right | INTPTLAT10_right | INTPTLON10_right | PARTFLG10_right | count | nunique | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 17 | 25 | 01905 | 2501905 | B5 | G6350 | S | 9219345 | 1195154 | +42.4659985 | -070.9757922 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 18 | 25 | 01904 | 2501904 | B5 | G6350 | S | 11708211 | 1303900 | +42.4924563 | -070.9739297 | ... | B5 | G6350 | S | 11708211.0 | 1303900.0 | +42.4924563 | -070.9739297 | N | 1.0 | 1.0 |
| 21 | 25 | 01915 | 2501915 | B5 | G6350 | S | 39091336 | 3958118 | +42.5702688 | -070.8669962 | ... | B5 | G6350 | S | 39091336.0 | 3958118.0 | +42.5702688 | -070.8669962 | N | 1.0 | 1.0 |
| 35 | 25 | 02462 | 2502462 | B5 | G6350 | S | 1369318 | 69749 | +42.3287076 | -071.2559002 | ... | B5 | G6350 | S | 1369318.0 | 69749.0 | +42.3287076 | -071.2559002 | N | 1.0 | 1.0 |
| 37 | 25 | 01760 | 2501760 | B5 | G6350 | S | 38359306 | 2599042 | +42.2848223 | -071.3488109 | ... | B5 | G6350 | S | 38359306.0 | 2599042.0 | +42.2848223 | -071.3488109 | N | 1.0 | 1.0 |
5 rows × 26 columns
#Replace missing values with 0
hospital_join.fillna(0, inplace=True)
#Do the map
hospital_join.plot(column = 'count',legend=True)
<AxesSubplot:>
Next we will work on ZCTA ranking!
spatial_count = hospital_join[['ZCTA5CE10_left','count']]
spatial_count_1 = spatial_count.sort_values('count',ascending=False)
spatial_count_1.head(3)
| ZCTA5CE10_left | count | |
|---|---|---|
| 318 | 02114 | 3.0 |
| 522 | 02215 | 3.0 |
| 186 | 02115 | 2.0 |
#ZCTA with highest rank
spatial_count_highest = spatial_count_1[spatial_count_1['count']== 3.0]
spatial_count_highest
| ZCTA5CE10_left | count | |
|---|---|---|
| 318 | 02114 | 3.0 |
| 522 | 02215 | 3.0 |
#We want to replace the missing value with 0 so that we can take a look at counties without hospitals.
spatial_count_1.fillna(0, inplace=True)
spatial_count_1
| ZCTA5CE10_left | count | |
|---|---|---|
| 318 | 02114 | 3.0 |
| 522 | 02215 | 3.0 |
| 186 | 02115 | 2.0 |
| 53 | 02118 | 2.0 |
| 316 | 02124 | 1.0 |
| ... | ... | ... |
| 230 | 02460 | 0.0 |
| 231 | 02451 | 0.0 |
| 232 | 01731 | 0.0 |
| 233 | 01741 | 0.0 |
| 529 | 02151 | 0.0 |
164 rows × 2 columns
#ZCTA with lowest rank
spatial_count_lowest = spatial_count_1[spatial_count_1['count'] == 0]
spatial_count_lowest
| ZCTA5CE10_left | count | |
|---|---|---|
| 385 | 02056 | 0.0 |
| 384 | 02171 | 0.0 |
| 380 | 02019 | 0.0 |
| 383 | 02170 | 0.0 |
| 319 | 02108 | 0.0 |
| ... | ... | ... |
| 230 | 02460 | 0.0 |
| 231 | 02451 | 0.0 |
| 232 | 01731 | 0.0 |
| 233 | 01741 | 0.0 |
| 529 | 02151 | 0.0 |
136 rows × 2 columns
There are 136 zipcodes which don't have hospitals within the zipcode area.
I will rasterize these hospitals and calculate distance.
show(landcover_1, ax=Hospitals.plot(color='red'), transform = landcover_transform)
plt.show()
Hospitals.crs == landcover.crs
False
#Change projection to be the same with landcover
h_c = Hospitals.to_crs(landcover.crs)
h_c.crs==landcover.crs
True
#View the dataset after changing projection
h_c.head(2)
| IDNUMBER | DPHID | NAME | SHORTNAME | ADDRESS | TOWN | GEOG_TOWN | ZIPCODE | CHIAREGION | TELEPHONE | ... | TAXSTATUS | BEDCOUNT | ER_STATUS | TRAUMA_ADU | TRAUMA_PED | SPEPUBFUND | FYE | MADID | EMSREGION | geometry | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2069 | 2069 | Beth Israel Deaconess Medical Center East | BIDMC East | 330 Brookline Avenue | Boston | BOSTON | 02215 | Metro Boston | (617) 667-7000 | ... | Non-profit | 248 | N | None | None | Not Applicable | 2017-09-30 | 35022096 | 4 | POINT (232494.091 898890.122) |
| 1 | 2085 | 2085 | Steward St. Elizabeth's Medical Center | St. Elizabeth's Medical Center | 736 Cambridge Street | Brighton | BOSTON | 02135 | Metro Boston | (617) 789-3000 | ... | For profit | 252 | Y | None | None | Not Applicable | 2017-12-31 | 35156980 | 4 | POINT (229003.958 899958.291) |
2 rows × 22 columns
#change to raster
hospital_raster= features.rasterize(h_c['geometry'],
out_shape=landcover_shape, fill=1, transform=landcover_transform,
default_value=0,all_touched=True)
hospital_raster
array([[1, 1, 1, ..., 1, 1, 1],
[1, 1, 1, ..., 1, 1, 1],
[1, 1, 1, ..., 1, 1, 1],
...,
[1, 1, 1, ..., 1, 1, 1],
[1, 1, 1, ..., 1, 1, 1],
[1, 1, 1, ..., 1, 1, 1]], dtype=uint8)
show(hospital_raster, transform = landcover_transform)
plt.show()
It is reasonable since we are having points.
hos_distance = ndimage.distance_transform_edt(hospital_raster) * landcover_res[0]
show(hos_distance, transform=landcover_transform)
plt.show()
hos_distance
array([[26153.54851641, 26139.62509295, 26125.72869797, ...,
22531.29823157, 22550.12416817, 22568.97427886],
[26126.98604891, 26113.04846241, 26099.1379168 , ...,
22507.93859953, 22526.78405809, 22545.65368314],
[26100.43103092, 26086.4792565 , 26072.55453537, ...,
22484.59472617, 22503.45973401, 22522.34890059],
...,
[24067.40534416, 24047.67140494, 24027.9587148 , ...,
41457.56866967, 41486.00848479, 41514.45049618],
[24090. , 24070.28458494, 24050.59042934, ...,
41467.13035646, 41495.5636183 , 41523.99908487],
[24112.61080845, 24092.913896 , 24073.2382533 , ...,
41476.71153792, 41505.13823613, 41533.56714755]])
landcover_meta
{'driver': 'GTiff',
'dtype': 'uint8',
'nodata': None,
'width': 3322,
'height': 3024,
'count': 1,
'crs': CRS.from_epsg(6491),
'transform': Affine(30.0, 0.0, 181770.0,
0.0, -30.0, 948390.0)}
Since it is epsg 6491, the unit is meter. Note that 1609 meters is about 1 mile.
hos_reclass = np.full(landcover_shape, np.NaN)
hos_reclass[(hos_distance <= 3218 )] = 5
hos_reclass[(hos_distance > 3218) & (hos_distance <=8046)] = 4
hos_reclass[(hos_distance > 8046 ) & (hos_distance <=12872 )] = 3
hos_reclass[(hos_distance >12872 ) & (hos_distance <=24135 )] = 2
hos_reclass[hos_distance >24135] = 1
#Map after reclassify
show(hos_reclass, transform=landcover_transform, cmap='RdYlBu_r')
plt.show()
show(hos_reclass+landcover_reclass, transform = landcover_transform, cmap='Greens')
<AxesSubplot:>
School = gpd.read_file('Data/schools/SCHOOLS_PT.shp')
School.plot()
<AxesSubplot:>
#Check projection before merge
School.crs == landcover.crs
False
school=School.to_crs(zcta_boston.crs)
#Join the two datasets to get schools that are within Boston region
school_m =school.merge(zcta_boston,left_on='ZIPCODE', right_on='ZCTA5CE10')
school_m.head(2)
| SCHID | NAME | ADDRESS | TOWN_MAIL | TOWN | ZIPCODE | PHONE | GRADES | TYPE | TYPE_DESC | ... | GEOID10 | CLASSFP10 | MTFCC10 | FUNCSTAT10 | ALAND10 | AWATER10 | INTPTLAT10 | INTPTLON10 | PARTFLG10 | geometry_y | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 08780605 | Tri-County Regional Vocational Technical High ... | 147 Pond Street | Franklin | FRANKLIN | 02038 | 508-528-5400 | 09,10,11,12 | VOC | Public Voc/Tech/Ag Reg'l HS | ... | 2502038 | B5 | G6350 | S | 68961360 | 1038205 | +42.0848585 | -071.4105714 | N | POLYGON ((-7945477.735 5178883.829, -7945313.8... |
| 1 | 01010030 | Oak Street Elementary School | 224 Oak Street | Franklin | FRANKLIN | 02038 | 508-541-7890 | K,01,02,03,04,05 | ELE | Public Elementary | ... | 2502038 | B5 | G6350 | S | 68961360 | 1038205 | +42.0848585 | -071.4105714 | N | POLYGON ((-7945477.735 5178883.829, -7945313.8... |
2 rows × 27 columns
type(school_m)
pandas.core.frame.DataFrame
from shapely.geometry import Point
from geopandas import GeoDataFrame
school_g = GeoDataFrame(school_m, geometry=school_m['geometry_x'])
school_g.plot(column = "SCHID",ax=zcta_boston.plot())
<AxesSubplot:>
It is clear that we can observe bad data here. We will do a spatial join instead.
#Spatial join
school_join = gpd.sjoin(school,zcta_boston,how='inner', op='within', lsuffix='left', rsuffix='right')
school_join.head()
| SCHID | NAME | ADDRESS | TOWN_MAIL | TOWN | ZIPCODE | PHONE | GRADES | TYPE | TYPE_DESC | ... | ZCTA5CE10 | GEOID10 | CLASSFP10 | MTFCC10 | FUNCSTAT10 | ALAND10 | AWATER10 | INTPTLAT10 | INTPTLON10 | PARTFLG10 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 08780605 | Tri-County Regional Vocational Technical High ... | 147 Pond Street | Franklin | FRANKLIN | 02038 | 508-528-5400 | 09,10,11,12 | VOC | Public Voc/Tech/Ag Reg'l HS | ... | 02038 | 2502038 | B5 | G6350 | S | 68961360 | 1038205 | +42.0848585 | -071.4105714 | N |
| 134 | 01010030 | Oak Street Elementary School | 224 Oak Street | Franklin | FRANKLIN | 02038 | 508-541-7890 | K,01,02,03,04,05 | ELE | Public Elementary | ... | 02038 | 2502038 | B5 | G6350 | S | 68961360 | 1038205 | +42.0848585 | -071.4105714 | N |
| 178 | 01010032 | Parmenter School | 235 Wachusett Street | Franklin | FRANKLIN | 02038 | 508-541-5281 | K,01,02,03,04,05 | ELE | Public Elementary | ... | 02038 | 2502038 | B5 | G6350 | S | 68961360 | 1038205 | +42.0848585 | -071.4105714 | N |
| 192 | 01010310 | Remington Middle School | 628 Washington Street | Franklin | FRANKLIN | 02038 | 508-541-2130 | 06,07,08 | SEC | Public Secondary | ... | 02038 | 2502038 | B5 | G6350 | S | 68961360 | 1038205 | +42.0848585 | -071.4105714 | N |
| 269 | 01010035 | Davis Thayer School | 137 West Central Street | Franklin | FRANKLIN | 02038 | 508-541-5263 | K,01,02,03,04,05 | ELE | Public Elementary | ... | 02038 | 2502038 | B5 | G6350 | S | 68961360 | 1038205 | +42.0848585 | -071.4105714 | N |
5 rows × 27 columns
school_join.plot(ax=zcta_boston.plot(),color='red')
<AxesSubplot:>
#Summary for number of hospitals per zipcode area
count_schools = school_join['SCHID'].groupby(school_m['ZIPCODE']).agg(['count','nunique'])
count_schools
| count | nunique | |
|---|---|---|
| ZIPCODE | ||
| 01460 | 3 | 3 |
| 01701 | 5 | 5 |
| 01702 | 2 | 2 |
| 01720 | 3 | 3 |
| 01721 | 5 | 5 |
| ... | ... | ... |
| 02481 | 4 | 4 |
| 02482 | 4 | 4 |
| 02492 | 7 | 7 |
| 02493 | 2 | 2 |
| 02494 | 1 | 1 |
137 rows × 2 columns
#Join count with zcta_boston to get number of schools per zipcode
spatial_sch = zcta_boston.merge(count_schools,left_on='ZCTA5CE10', right_on='ZIPCODE')
spatial_sch.head(2)
| STATEFP10 | ZCTA5CE10 | GEOID10 | CLASSFP10 | MTFCC10 | FUNCSTAT10 | ALAND10 | AWATER10 | INTPTLAT10 | INTPTLON10 | PARTFLG10 | geometry | count | nunique | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 25 | 01905 | 2501905 | B5 | G6350 | S | 9219345 | 1195154 | +42.4659985 | -070.9757922 | N | MULTIPOLYGON (((-7899944.411 5232206.578, -789... | 6 | 6 |
| 1 | 25 | 01904 | 2501904 | B5 | G6350 | S | 11708211 | 1303900 | +42.4924563 | -070.9739297 | N | POLYGON ((-7897468.553 5233487.096, -7897513.7... | 2 | 2 |
#Similarly with hospital, we use spatial join to include zipcodes which have 0 schools. Count will be NaN.
#Spatial Join
sch_join = gpd.sjoin(zcta_boston,spatial_sch, how='left', op='contains', lsuffix='left', rsuffix='right')
sch_join.head()
| STATEFP10_left | ZCTA5CE10_left | GEOID10_left | CLASSFP10_left | MTFCC10_left | FUNCSTAT10_left | ALAND10_left | AWATER10_left | INTPTLAT10_left | INTPTLON10_left | ... | CLASSFP10_right | MTFCC10_right | FUNCSTAT10_right | ALAND10_right | AWATER10_right | INTPTLAT10_right | INTPTLON10_right | PARTFLG10_right | count | nunique | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 17 | 25 | 01905 | 2501905 | B5 | G6350 | S | 9219345 | 1195154 | +42.4659985 | -070.9757922 | ... | B5 | G6350 | S | 9219345.0 | 1195154.0 | +42.4659985 | -070.9757922 | N | 6.0 | 6.0 |
| 18 | 25 | 01904 | 2501904 | B5 | G6350 | S | 11708211 | 1303900 | +42.4924563 | -070.9739297 | ... | B5 | G6350 | S | 11708211.0 | 1303900.0 | +42.4924563 | -070.9739297 | N | 2.0 | 2.0 |
| 21 | 25 | 01915 | 2501915 | B5 | G6350 | S | 39091336 | 3958118 | +42.5702688 | -070.8669962 | ... | B5 | G6350 | S | 39091336.0 | 3958118.0 | +42.5702688 | -070.8669962 | N | 7.0 | 7.0 |
| 35 | 25 | 02462 | 2502462 | B5 | G6350 | S | 1369318 | 69749 | +42.3287076 | -071.2559002 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 37 | 25 | 01760 | 2501760 | B5 | G6350 | S | 38359306 | 2599042 | +42.2848223 | -071.3488109 | ... | B5 | G6350 | S | 38359306.0 | 2599042.0 | +42.2848223 | -071.3488109 | N | 9.0 | 9.0 |
5 rows × 26 columns
sch_join.fillna(0, inplace=True)
sch_join.plot(column = 'count',legend=True)
<AxesSubplot:>
#ZCTA Ranking
count_school_order = sch_join.sort_values('count',ascending=False)
count_school_order
| STATEFP10_left | ZCTA5CE10_left | GEOID10_left | CLASSFP10_left | MTFCC10_left | FUNCSTAT10_left | ALAND10_left | AWATER10_left | INTPTLAT10_left | INTPTLON10_left | ... | CLASSFP10_right | MTFCC10_right | FUNCSTAT10_right | ALAND10_right | AWATER10_right | INTPTLAT10_right | INTPTLON10_right | PARTFLG10_right | count | nunique | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 109 | 25 | 01880 | 2501880 | B5 | G6350 | S | 17889529 | 1399492 | +42.5015278 | -071.0674996 | ... | B5 | G6350 | S | 17889529.0 | 1399492.0 | +42.5015278 | -071.0674996 | N | 10.0 | 10.0 |
| 37 | 25 | 01760 | 2501760 | B5 | G6350 | S | 38359306 | 2599042 | +42.2848223 | -071.3488109 | ... | B5 | G6350 | S | 38359306.0 | 2599042.0 | +42.2848223 | -071.3488109 | N | 9.0 | 9.0 |
| 183 | 25 | 02130 | 2502130 | B5 | G6350 | S | 8635553 | 328448 | +42.3091743 | -071.1138354 | ... | B5 | G6350 | S | 8635553.0 | 328448.0 | +42.3091743 | -071.1138354 | N | 9.0 | 9.0 |
| 245 | 25 | 02135 | 2502135 | B5 | G6350 | S | 6824138 | 572755 | +42.3496885 | -071.1539638 | ... | B5 | G6350 | S | 6824138.0 | 572755.0 | +42.3496885 | -071.1539638 | N | 8.0 | 8.0 |
| 278 | 25 | 01923 | 2501923 | B5 | G6350 | S | 34194563 | 2348722 | +42.5741744 | -070.9505165 | ... | B5 | G6350 | S | 34194563.0 | 2348722.0 | +42.5741744 | -070.9505165 | N | 7.0 | 7.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 380 | 25 | 02019 | 2502019 | B5 | G6350 | S | 47523223 | 1424298 | +42.0766822 | -071.4744898 | ... | 0 | 0 | 0 | 0.0 | 0.0 | 0 | 0 | 0 | 0.0 | 0.0 |
| 391 | 25 | 02111 | 2502111 | B5 | G6350 | S | 669967 | 56933 | +42.3505177 | -071.0590623 | ... | 0 | 0 | 0 | 0.0 | 0.0 | 0 | 0 | 0 | 0.0 | 0.0 |
| 392 | 25 | 02210 | 2502210 | B5 | G6350 | S | 2399614 | 1404157 | +42.3477923 | -071.0395624 | ... | 0 | 0 | 0 | 0.0 | 0.0 | 0 | 0 | 0 | 0.0 | 0.0 |
| 429 | 25 | 01944 | 2501944 | B5 | G6350 | S | 23902799 | 3207908 | +42.5766364 | -070.7671544 | ... | 0 | 0 | 0 | 0.0 | 0.0 | 0 | 0 | 0 | 0.0 | 0.0 |
| 318 | 25 | 02114 | 2502114 | B5 | G6350 | S | 1164209 | 303495 | +42.3631745 | -071.0686463 | ... | 0 | 0 | 0 | 0.0 | 0.0 | 0 | 0 | 0 | 0.0 | 0.0 |
164 rows × 26 columns
#ZCTA with highest rank
ZCTA_School_highest = count_school_order[count_school_order['count']== 10.0]
ZCTA_School_highest
| STATEFP10_left | ZCTA5CE10_left | GEOID10_left | CLASSFP10_left | MTFCC10_left | FUNCSTAT10_left | ALAND10_left | AWATER10_left | INTPTLAT10_left | INTPTLON10_left | ... | CLASSFP10_right | MTFCC10_right | FUNCSTAT10_right | ALAND10_right | AWATER10_right | INTPTLAT10_right | INTPTLON10_right | PARTFLG10_right | count | nunique | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 109 | 25 | 01880 | 2501880 | B5 | G6350 | S | 17889529 | 1399492 | +42.5015278 | -071.0674996 | ... | B5 | G6350 | S | 17889529.0 | 1399492.0 | +42.5015278 | -071.0674996 | N | 10.0 | 10.0 |
1 rows × 26 columns
01880 is the highest ZCTA ranking
#ZCTA with lowest rank
ZCTA_School_lowest = count_school_order[count_school_order['count']== 0.0]
ZCTA_School_lowest
| STATEFP10_left | ZCTA5CE10_left | GEOID10_left | CLASSFP10_left | MTFCC10_left | FUNCSTAT10_left | ALAND10_left | AWATER10_left | INTPTLAT10_left | INTPTLON10_left | ... | CLASSFP10_right | MTFCC10_right | FUNCSTAT10_right | ALAND10_right | AWATER10_right | INTPTLAT10_right | INTPTLON10_right | PARTFLG10_right | count | nunique | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 471 | 25 | 02203 | 2502203 | B5 | G6350 | S | 80317 | 0 | +42.3605978 | -071.0587753 | ... | 0 | 0 | 0 | 0.0 | 0.0 | 0 | 0 | 0 | 0.0 | 0.0 |
| 35 | 25 | 02462 | 2502462 | B5 | G6350 | S | 1369318 | 69749 | +42.3287076 | -071.2559002 | ... | 0 | 0 | 0 | 0.0 | 0.0 | 0 | 0 | 0 | 0.0 | 0.0 |
| 519 | 25 | 02464 | 2502464 | B5 | G6350 | S | 1463154 | 29494 | +42.3129751 | -071.2188818 | ... | 0 | 0 | 0 | 0.0 | 0.0 | 0 | 0 | 0 | 0.0 | 0.0 |
| 523 | 25 | 02071 | 2502071 | B5 | G6350 | S | 2513156 | 33452 | +42.1034247 | -071.2735885 | ... | 0 | 0 | 0 | 0.0 | 0.0 | 0 | 0 | 0 | 0.0 | 0.0 |
| 87 | 25 | 01966 | 2501966 | B5 | G6350 | S | 18106523 | 4167724 | +42.6407411 | -070.6202423 | ... | 0 | 0 | 0 | 0.0 | 0.0 | 0 | 0 | 0 | 0.0 | 0.0 |
| 451 | 25 | 02142 | 2502142 | B5 | G6350 | S | 716170 | 577829 | +42.3614712 | -071.0819939 | ... | 0 | 0 | 0 | 0.0 | 0.0 | 0 | 0 | 0 | 0.0 | 0.0 |
| 446 | 25 | 01719 | 2501719 | B5 | G6350 | S | 26640078 | 316293 | +42.4859849 | -071.5209848 | ... | 0 | 0 | 0 | 0.0 | 0.0 | 0 | 0 | 0 | 0.0 | 0.0 |
| 114 | 25 | 02457 | 2502457 | B5 | G6350 | S | 413241 | 0 | +42.2993878 | -071.2742420 | ... | 0 | 0 | 0 | 0.0 | 0.0 | 0 | 0 | 0 | 0.0 | 0.0 |
| 116 | 25 | 02126 | 2502126 | B5 | G6350 | S | 5392506 | 62719 | +42.2740159 | -071.0967897 | ... | 0 | 0 | 0 | 0.0 | 0.0 | 0 | 0 | 0 | 0.0 | 0.0 |
| 118 | 25 | 02109 | 2502109 | B5 | G6350 | S | 449654 | 292691 | +42.3672234 | -071.0506328 | ... | 0 | 0 | 0 | 0.0 | 0.0 | 0 | 0 | 0 | 0.0 | 0.0 |
| 127 | 25 | 01745 | 2501745 | B5 | G6350 | S | 1033541 | 3023 | +42.2930726 | -071.4991628 | ... | 0 | 0 | 0 | 0.0 | 0.0 | 0 | 0 | 0 | 0.0 | 0.0 |
| 150 | 25 | 01901 | 2501901 | B5 | G6350 | S | 641320 | 68753 | +42.4607111 | -070.9460730 | ... | 0 | 0 | 0 | 0.0 | 0.0 | 0 | 0 | 0 | 0.0 | 0.0 |
| 168 | 25 | 02461 | 2502461 | B5 | G6350 | S | 3767371 | 36499 | +42.3173616 | -071.2065077 | ... | 0 | 0 | 0 | 0.0 | 0.0 | 0 | 0 | 0 | 0.0 | 0.0 |
| 52 | 25 | 02110 | 2502110 | B5 | G6350 | S | 479769 | 234496 | +42.3619623 | -071.0478465 | ... | 0 | 0 | 0 | 0.0 | 0.0 | 0 | 0 | 0 | 0.0 | 0.0 |
| 51 | 25 | 02125 | 2502125 | B5 | G6350 | S | 5523791 | 799283 | +42.3157481 | -071.0558957 | ... | 0 | 0 | 0 | 0.0 | 0.0 | 0 | 0 | 0 | 0.0 | 0.0 |
| 187 | 25 | 02163 | 2502163 | B5 | G6350 | S | 256727 | 32165 | +42.3661684 | -071.1228503 | ... | 0 | 0 | 0 | 0.0 | 0.0 | 0 | 0 | 0 | 0.0 | 0.0 |
| 211 | 25 | 01929 | 2501929 | B5 | G6350 | S | 36187007 | 5125535 | +42.6344210 | -070.7765923 | ... | 0 | 0 | 0 | 0.0 | 0.0 | 0 | 0 | 0 | 0.0 | 0.0 |
| 233 | 25 | 01741 | 2501741 | B5 | G6350 | S | 39536091 | 641968 | +42.5366203 | -071.3618321 | ... | 0 | 0 | 0 | 0.0 | 0.0 | 0 | 0 | 0 | 0.0 | 0.0 |
| 234 | 25 | 01718 | 2501718 | B5 | G6350 | S | 121367 | 0 | +42.5198155 | -071.4292828 | ... | 0 | 0 | 0 | 0.0 | 0.0 | 0 | 0 | 0 | 0.0 | 0.0 |
| 244 | 25 | 02047 | 2502047 | B5 | G6350 | S | 619346 | 21837 | +42.1337338 | -070.6858599 | ... | 0 | 0 | 0 | 0.0 | 0.0 | 0 | 0 | 0 | 0.0 | 0.0 |
| 246 | 25 | 02199 | 2502199 | B5 | G6350 | S | 148951 | 0 | +42.3474633 | -071.0820392 | ... | 0 | 0 | 0 | 0.0 | 0.0 | 0 | 0 | 0 | 0.0 | 0.0 |
| 319 | 25 | 02108 | 2502108 | B5 | G6350 | S | 354836 | 0 | +42.3577670 | -071.0648484 | ... | 0 | 0 | 0 | 0.0 | 0.0 | 0 | 0 | 0 | 0.0 | 0.0 |
| 380 | 25 | 02019 | 2502019 | B5 | G6350 | S | 47523223 | 1424298 | +42.0766822 | -071.4744898 | ... | 0 | 0 | 0 | 0.0 | 0.0 | 0 | 0 | 0 | 0.0 | 0.0 |
| 391 | 25 | 02111 | 2502111 | B5 | G6350 | S | 669967 | 56933 | +42.3505177 | -071.0590623 | ... | 0 | 0 | 0 | 0.0 | 0.0 | 0 | 0 | 0 | 0.0 | 0.0 |
| 392 | 25 | 02210 | 2502210 | B5 | G6350 | S | 2399614 | 1404157 | +42.3477923 | -071.0395624 | ... | 0 | 0 | 0 | 0.0 | 0.0 | 0 | 0 | 0 | 0.0 | 0.0 |
| 429 | 25 | 01944 | 2501944 | B5 | G6350 | S | 23902799 | 3207908 | +42.5766364 | -070.7671544 | ... | 0 | 0 | 0 | 0.0 | 0.0 | 0 | 0 | 0 | 0.0 | 0.0 |
| 318 | 25 | 02114 | 2502114 | B5 | G6350 | S | 1164209 | 303495 | +42.3631745 | -071.0686463 | ... | 0 | 0 | 0 | 0.0 | 0.0 | 0 | 0 | 0 | 0.0 | 0.0 |
27 rows × 26 columns
There are 27 ZCTA that don't have schools.
I will calculate the distance to schools
#Check projection
School.crs==landcover.crs
False
school_c = School.to_crs(landcover.crs)
#Raster
school_raster= features.rasterize(school_c['geometry'],
out_shape=landcover_shape, fill=1, transform=landcover_transform, default_value=0)
school_distance = ndimage.distance_transform_edt(school_raster) * landcover_res[0]
show(school_distance, transform=landcover_transform)
plt.show()
school_distance
array([[12377.17253657, 12373.86358418, 12370.6264999 , ...,
16703.77801577, 16718.72303736, 16733.70849513],
[12347.36004173, 12344.04309779, 12340.79819137, ...,
16677.76064105, 16692.72895605, 16707.73772837],
[12317.54845738, 12314.22348344, 12310.97071721, ...,
16651.75666409, 16666.74833313, 16681.78048051],
...,
[10954.97147418, 10950.49313958, 10946.09519418, ...,
14030.06058433, 14060.01778093, 14089.97515967],
[10984.63017129, 10980.1639332 , 10975.77787676, ...,
14031.69626239, 14061.64997431, 14091.60388316],
[11014.29071706, 11009.83651105, 11005.46228016, ...,
14033.39588268, 14063.34597455, 14093.29627873]])
#Change the reclassification!!!!!
school_reclass = np.full(landcover_shape, np.NaN)
school_reclass[school_distance <= 1609 ] = 5
school_reclass[(school_distance >1609 ) & (school_distance <=6436)] = 4
school_reclass[(school_distance >6436 ) & (school_distance <=12872 )] = 3
school_reclass[(school_distance >12872 ) & (school_distance <24135)] = 2
school_reclass[school_distance >24135 ] = 1
show(school_reclass, transform=landcover_transform, cmap='RdYlBu_r')
plt.show()
show(school_reclass+landcover_reclass,transform = landcover_transform, cmap='Greens')
<AxesSubplot:>
#MBTA
MBTA = gpd.read_file('Data/MBTA/MBTA_NODE.shp')
MBTA.head(2)
| STATION | LINE | TERMINUS | ROUTE | geometry | |
|---|---|---|---|---|---|
| 0 | Ashmont | RED | Y | A - Ashmont C - Alewife | POINT (236007.538 892693.023) |
| 1 | Harvard | RED | N | A - Ashmont B - Braintree C - Alewife | POINT (231387.274 902684.016) |
There is no zipcode here. We are not able to join with ZCTA and do a ZCTA ranking. We will go ahead to map the points and change it to raster.
#Change projection and prepare for mapping
mbta=MBTA.to_crs(zcta_boston.crs)
#map
mbta.plot(column = "STATION",ax=zcta_boston.plot())
<AxesSubplot:>
#prepare for transforming to raster
mbta.crs == landcover.crs
False
landcover.crs
CRS.from_epsg(6491)
mbta.crs
<Projected CRS: EPSG:3857> Name: WGS 84 / Pseudo-Mercator Axis Info [cartesian]: - X[east]: Easting (metre) - Y[north]: Northing (metre) Area of Use: - name: World between 85.06°S and 85.06°N. - bounds: (-180.0, -85.06, 180.0, 85.06) Coordinate Operation: - name: Popular Visualisation Pseudo-Mercator - method: Popular Visualisation Pseudo Mercator Datum: World Geodetic System 1984 ensemble - Ellipsoid: WGS 84 - Prime Meridian: Greenwich
mbta_u=mbta.to_crs(landcover.crs)
#change to raster
MBTA_raster = features.rasterize(mbta_u['geometry'], out_shape=landcover_shape, fill=1, transform=landcover_transform, default_value=0)
MBTA_raster
array([[1, 1, 1, ..., 1, 1, 1],
[1, 1, 1, ..., 1, 1, 1],
[1, 1, 1, ..., 1, 1, 1],
...,
[1, 1, 1, ..., 1, 1, 1],
[1, 1, 1, ..., 1, 1, 1],
[1, 1, 1, ..., 1, 1, 1]], dtype=uint8)
MBTA_distance = ndimage.distance_transform_edt(MBTA_raster) * landcover_res[0]
show(MBTA_distance, transform=landcover_transform)
plt.show()
MBTA_distance
array([[63014.06271619, 62995.68556655, 62977.31734522, ...,
57119.19642292, 57139.95187257, 57160.71553086],
[62990.35640477, 62971.97233691, 62953.59719667, ...,
57097.53497306, 57118.29829398, 57139.06982092],
[62966.65546144, 62948.26447171, 62929.88240892, ...,
57075.88107073, 57096.65226614, 57117.43166495],
...,
[56194.49261271, 56173.88450161, 56153.28485494, ...,
48143.12100394, 48168.17414019, 48193.23292746],
[56216.29390132, 56195.6937852 , 56175.10213609, ...,
48159.63455011, 48184.67910031, 48209.72930851],
[56238.10274182, 56217.51061724, 56196.92696225, ...,
48176.1611173 , 48201.19708057, 48226.23870882]])
#reclassify
MBTA_reclass = np.full(landcover_shape, np.NaN)
MBTA_reclass[(MBTA_distance <=3218 )] = 5
MBTA_reclass[(MBTA_distance >3218 ) & (MBTA_distance <= 8046)] = 4
MBTA_reclass[(MBTA_distance > 8046 ) & (MBTA_distance <= 12872)] = 3
MBTA_reclass[(MBTA_distance >12872 ) & (MBTA_distance < 24135 )] = 2
MBTA_reclass[(MBTA_distance > 24135 )] = 1
#Visualization for raster
show(MBTA_reclass, transform=landcover_transform, cmap='RdYlBu_r')
plt.show()
Library = gpd.read_file('Data/library/LIBRARIES_PT.shp')
library=Library.to_crs(zcta_boston.crs)
library.head(2)
| NAME | ADDRESS | TOWN | STATE | ZIP | TYPE | ADDRESS_PO | LIBKEY | NETWORK | REGION | DELIVROUTE | WEBSITE | geometry | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | EGREMONT FREE LIBRARY | 1 BUTTONBALL LANE | EGREMONT | MA | 01230 | PUBLIC | M_42081_880811 | PU-EGREMONT-FREE | C/WMARS | Western | Optima | http://egremont-ma.gov/ | POINT (-8172091.151 5185248.592) |
| 1 | FLORIDA FREE PUBLIC LIBRARY | 56 NORTH COUNTY ROAD | FLORIDA | MA | 01247 | PUBLIC | M_73999_939186 | PU-FLORIDA-FREE | C/WMARS | Western | Optima | None | POINT (-8130519.445 5265364.956) |
#Join the two datasets get libraries within ZCTA Boston
library_m =library.merge(zcta_boston,left_on='ZIP', right_on='ZCTA5CE10')
library_m.head(2)
| NAME | ADDRESS | TOWN | STATE | ZIP | TYPE | ADDRESS_PO | LIBKEY | NETWORK | REGION | ... | GEOID10 | CLASSFP10 | MTFCC10 | FUNCSTAT10 | ALAND10 | AWATER10 | INTPTLAT10 | INTPTLON10 | PARTFLG10 | geometry_y | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ROBBINS LIBRARY | 700 MASSACHUSETTS AVENUE | ARLINGTON | MA | 02476 | PUBLIC | M_228383_907296 | PU-ARLINGTO-ROBBINS | MLN | Metrowest | ... | 2502476 | B5 | G6350 | S | 5422176 | 191590 | +42.4156366 | -071.1756700 | N | POLYGON ((-7923039.362 5224892.512, -7922688.8... |
| 1 | BACON FREE LIBRARY | 58 ELIOT STREET | NATICK | MA | 01760 | PUBLIC | M_215167_891319 | PU-NATICK-BACON | MLN | Metrowest | ... | 2501760 | B5 | G6350 | S | 38359306 | 2599042 | +42.2848223 | -071.3488109 | N | POLYGON ((-7948367.918 5204315.595, -7948371.4... |
2 rows × 25 columns
#Change to geodataframe and prepare for mapping
from shapely.geometry import Point
from geopandas import GeoDataFrame
library_g = GeoDataFrame(library_m, geometry=library_m['geometry_x'])
#Map libraries
library_g.plot(column = "NAME",ax=zcta_boston.plot())
<AxesSubplot:>
#Summary for number of libraries per zipcode area
count_libraries = library_m['NAME'].groupby(library_m['ZIP']).agg(['count','nunique'])
count_libraries
| count | nunique | |
|---|---|---|
| ZIP | ||
| 01460 | 1 | 1 |
| 01701 | 1 | 1 |
| 01702 | 1 | 1 |
| 01719 | 1 | 1 |
| 01720 | 2 | 2 |
| ... | ... | ... |
| 02478 | 1 | 1 |
| 02481 | 1 | 1 |
| 02482 | 2 | 2 |
| 02493 | 1 | 1 |
| 02494 | 1 | 1 |
130 rows × 2 columns
# Join count with zcta_boston so we can know number of libraries within each zcta.
spatial_l = zcta_boston.merge(count_libraries,left_on='ZCTA5CE10', right_on='ZIP')
spatial_l.head(2)
| STATEFP10 | ZCTA5CE10 | GEOID10 | CLASSFP10 | MTFCC10 | FUNCSTAT10 | ALAND10 | AWATER10 | INTPTLAT10 | INTPTLON10 | PARTFLG10 | geometry | count | nunique | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 25 | 01915 | 2501915 | B5 | G6350 | S | 39091336 | 3958118 | +42.5702688 | -070.8669962 | N | MULTIPOLYGON (((-7890462.043 5249910.060, -788... | 2 | 2 |
| 1 | 25 | 01760 | 2501760 | B5 | G6350 | S | 38359306 | 2599042 | +42.2848223 | -071.3488109 | N | POLYGON ((-7948367.918 5204315.595, -7948371.4... | 2 | 2 |
#Use spatial join to include count equals to NaN. So we have an idea of each ZCTA, including ZCTA without a library.
l_join = gpd.sjoin(zcta_boston,spatial_l, how='left', op='contains', lsuffix='left', rsuffix='right')
l_join.head()
| STATEFP10_left | ZCTA5CE10_left | GEOID10_left | CLASSFP10_left | MTFCC10_left | FUNCSTAT10_left | ALAND10_left | AWATER10_left | INTPTLAT10_left | INTPTLON10_left | ... | CLASSFP10_right | MTFCC10_right | FUNCSTAT10_right | ALAND10_right | AWATER10_right | INTPTLAT10_right | INTPTLON10_right | PARTFLG10_right | count | nunique | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 17 | 25 | 01905 | 2501905 | B5 | G6350 | S | 9219345 | 1195154 | +42.4659985 | -070.9757922 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 18 | 25 | 01904 | 2501904 | B5 | G6350 | S | 11708211 | 1303900 | +42.4924563 | -070.9739297 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 21 | 25 | 01915 | 2501915 | B5 | G6350 | S | 39091336 | 3958118 | +42.5702688 | -070.8669962 | ... | B5 | G6350 | S | 39091336.0 | 3958118.0 | +42.5702688 | -070.8669962 | N | 2.0 | 2.0 |
| 35 | 25 | 02462 | 2502462 | B5 | G6350 | S | 1369318 | 69749 | +42.3287076 | -071.2559002 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 37 | 25 | 01760 | 2501760 | B5 | G6350 | S | 38359306 | 2599042 | +42.2848223 | -071.3488109 | ... | B5 | G6350 | S | 38359306.0 | 2599042.0 | +42.2848223 | -071.3488109 | N | 2.0 | 2.0 |
5 rows × 26 columns
#Replace all NaN with 0
l_join.fillna(0, inplace=True)
#Choropleth map
#Number of libraries in each ZCTA
l_join.plot(column = 'count',legend=True)
<AxesSubplot:>
I will calculate distance to libraries
#Change and check the projection
Library.crs==landcover.crs
False
#Change projection
library_c = Library.to_crs(landcover.crs)
#Change to raster
l_raster = features.rasterize(library_c['geometry'], out_shape=landcover_shape, fill=1, transform=landcover_transform, default_value=0)
#Calculate the distance
l_distance = ndimage.distance_transform_edt(l_raster) * landcover_res[0]
l_distance
array([[12303.65799265, 12299.45120727, 12295.31618138, ...,
16567.22668403, 16583.59731783, 16600.0060241 ],
[12273.9602411 , 12269.7432736 , 12265.59823245, ...,
16542.08269838, 16558.47819094, 16574.91176447],
[12244.26396318, 12240.03676465, 12235.8816601 , ...,
16516.95492517, 16533.37533597, 16549.83383602],
...,
[11412.78230757, 11411.40219254, 11410.10078834, ...,
16318.29035163, 16343.50329642, 16368.73238831],
[11442.74879563, 11441.37229531, 11440.07430046, ...,
16334.5798844 , 16359.76772451, 16384.97177294],
[11472.71545886, 11471.34255438, 11470.04795108, ...,
16350.90823165, 16376.07095735, 16401.24995237]])
#Reclassify
library_reclass = np.full(landcover_shape, np.NaN)
library_reclass[(l_distance <=3218 )] = 5
library_reclass[(l_distance >3218 ) & (l_distance <= 8046)] = 4
library_reclass[(l_distance > 8046 ) & (l_distance <= 12872)] = 3
library_reclass[(l_distance >12872 ) & (l_distance < 24135 )] = 2
library_reclass[(l_distance > 24135 )] = 1
# Map reclassification
show(library_reclass, transform=landcover_transform, cmap='RdYlBu_r')
plt.show()
After finish reclassifying all the indicators, I will calculate the weighted suitable raster.I will start my suitability analysis here.
#Calculate weighted rasters
weighted_suitability = landcover_reclass*0.10 + Trees_reclass*0.15 + hos_reclass*0.3 + school_reclass*0.15 + MBTA_reclass*0.2 + library_reclass*0.10
#Unweighted
unweighted_suitability = landcover_reclass + Trees_reclass + hos_reclass + school_reclass + MBTA_reclass + library_reclass
show(weighted_suitability)
<AxesSubplot:>
show(unweighted_suitability)
<AxesSubplot:>
We can observe difference between weighted suitability and unweighted suitability.
#Check projection
zcta_boston.crs == landcover.crs
False
zcta_crs=zcta_boston.to_crs(landcover.crs)
#Copy
suitability_map = weighted_suitability.copy()
boundaries_raster = features.rasterize(zcta_crs['geometry'], out_shape=landcover_shape,
fill=1, transform=landcover_transform, default_value=0)
show(boundaries_raster)
<AxesSubplot:>
#suitability within the boundaries
suitability_map[boundaries_raster == 1] = np.NaN
show(suitability_map, cmap='RdYlBu_r',transform=landcover_transform)
plt.show()
#Zonal Statistics
#Visualization
ax = zcta_crs.plot(facecolor='None', edgecolor='black',zorder=1)
show(suitability_map, ax=ax,cmap='RdYlBu_r',transform=landcover_transform)
plt.show()
stat_1=zonal_stats(zcta_crs, suitability_map, affine=landcover_transform, stats=['mean', 'count'], nodata = np.NaN)
stat_1[:2]
[{'mean': 3.9435350377046166, 'count': 10874},
{'mean': 3.9416464891041163, 'count': 14455}]
stat_2=pd.DataFrame(stat_1)
stats = pd.DataFrame(zonal_stats(zcta_boston, suitability_map, affine=landcover_transform, stats=['mean', 'count'], nodata = np.NaN))
len(list(stat_2['mean']))
164
zcta_boston.head(2)
| STATEFP10 | ZCTA5CE10 | GEOID10 | CLASSFP10 | MTFCC10 | FUNCSTAT10 | ALAND10 | AWATER10 | INTPTLAT10 | INTPTLON10 | PARTFLG10 | geometry | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 17 | 25 | 01905 | 2501905 | B5 | G6350 | S | 9219345 | 1195154 | +42.4659985 | -070.9757922 | N | MULTIPOLYGON (((-7899944.411 5232206.578, -789... |
| 18 | 25 | 01904 | 2501904 | B5 | G6350 | S | 11708211 | 1303900 | +42.4924563 | -070.9739297 | N | POLYGON ((-7897468.553 5233487.096, -7897513.7... |
zones_stats = zcta_crs.copy()
Since the index mismatch between the two, missing values of mean value happen. I have to reorder the index.
zones_stats.reset_index(drop=True, inplace=True)
zones_stats['mean']=stat_2['mean']
zones_stats
| STATEFP10 | ZCTA5CE10 | GEOID10 | CLASSFP10 | MTFCC10 | FUNCSTAT10 | ALAND10 | AWATER10 | INTPTLAT10 | INTPTLON10 | PARTFLG10 | geometry | mean | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 25 | 01905 | 2501905 | B5 | G6350 | S | 9219345 | 1195154 | +42.4659985 | -070.9757922 | N | MULTIPOLYGON (((243877.747 913824.569, 243872.... | 3.943535 |
| 1 | 25 | 01904 | 2501904 | B5 | G6350 | S | 11708211 | 1303900 | +42.4924563 | -070.9739297 | N | POLYGON ((245700.479 914778.626, 245667.407 91... | 3.941646 |
| 2 | 25 | 01915 | 2501915 | B5 | G6350 | S | 39091336 | 3958118 | +42.5702688 | -070.8669962 | N | MULTIPOLYGON (((250787.754 926889.283, 251155.... | 3.698648 |
| 3 | 25 | 02462 | 2502462 | B5 | G6350 | S | 1369318 | 69749 | +42.3287076 | -071.2559002 | N | MULTIPOLYGON (((222057.985 897313.444, 222020.... | 4.431644 |
| 4 | 25 | 01760 | 2501760 | B5 | G6350 | S | 38359306 | 2599042 | +42.2848223 | -071.3488109 | N | POLYGON ((208131.465 893134.440, 208128.810 89... | 3.906878 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 159 | 25 | 02343 | 2502343 | B5 | G6350 | S | 18146751 | 280250 | +42.1489129 | -071.0037371 | N | POLYGON ((243078.694 878153.346, 243068.777 87... | 3.832046 |
| 160 | 25 | 02129 | 2502129 | B5 | G6350 | S | 3492181 | 1266666 | +42.3796570 | -071.0614875 | N | MULTIPOLYGON (((235381.167 902518.306, 235374.... | 4.307002 |
| 161 | 25 | 02128 | 2502128 | B5 | G6350 | S | 12561059 | 2952872 | +42.3611289 | -071.0069754 | N | MULTIPOLYGON (((240204.456 905138.055, 240343.... | 4.022415 |
| 162 | 25 | 02122 | 2502122 | B5 | G6350 | S | 5263269 | 1641767 | +42.2914125 | -071.0421575 | N | MULTIPOLYGON (((237311.340 892366.805, 237300.... | 4.157824 |
| 163 | 25 | 02151 | 2502151 | B5 | G6350 | S | 15190546 | 3291332 | +42.4182938 | -071.0012566 | N | MULTIPOLYGON (((240204.456 905138.055, 240184.... | 4.122789 |
164 rows × 13 columns
No missing values!
#Report zcta with highest suitability score
zones_stats.sort_values('mean',ascending=False).head(1)
| STATEFP10 | ZCTA5CE10 | GEOID10 | CLASSFP10 | MTFCC10 | FUNCSTAT10 | ALAND10 | AWATER10 | INTPTLAT10 | INTPTLON10 | PARTFLG10 | geometry | mean | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 37 | 25 | 02468 | 2502468 | B5 | G6350 | S | 3833292 | 57129 | +42.3285534 | -071.2295534 | N | POLYGON ((223199.541 898594.497, 223210.181 89... | 4.495862 |
#Report zcta with lowest suitability score
zones_stats.sort_values('mean',ascending=True).head(1)
| STATEFP10 | ZCTA5CE10 | GEOID10 | CLASSFP10 | MTFCC10 | FUNCSTAT10 | ALAND10 | AWATER10 | INTPTLAT10 | INTPTLON10 | PARTFLG10 | geometry | mean | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 72 | 25 | 02047 | 2502047 | B5 | G6350 | S | 619346 | 21837 | +42.1337338 | -070.6858599 | N | POLYGON ((267394.941 875836.097, 267422.018 87... | 2.323127 |
zones_stats.plot(column = 'mean', cmap='RdYlBu_r', legend=True)
plt.show()
Next I will do an interactive map to better show the suitability map
mymap = folium.Map(location=[42.3601, -71.0589], zoom_start=9,tiles=None)
folium.TileLayer('CartoDB positron',name="Light Map",control=False).add_to(mymap)
<folium.raster_layers.TileLayer at 0x7ff03c23b160>
mymap.choropleth(
geo_data=zones_stats.to_crs('epsg:4326'),
name='Choropleth',
data=zones_stats,
columns=['GEOID10','mean'],
key_on="feature.properties.GEOID10",
fill_color='RdYlBu_r',
fill_opacity=1,
line_opacity=0.2,
legend_name='Suitability score',
smooth_factor=0
)
mymap